Composition of Constraint, Hypothesis and Error Models to improve interaction in Human-Machine Interfaces
We use Weighted Finite-State Transducers (WFSTs) to represent the different sources of information available: the initial hypotheses, the possible errors, the constraints imposed by the task (interaction language) and the user input. The fusion of these models to find the most probable output string can be performed efficiently by using carefully selected transducer operations. The proposed system initially suggests an output based on the set of hypotheses, possible errors and Constraint Models. Then, if human intervention is needed, a multimodal approach, where the user input is combined with the aforementioned models, is applied to produce the desired output with minimum user effort. This approach offers the practical advantages of a decoupled model (e.g. input system + parameterized rules + post-processor), while keeping the error-recovery power of an integrated approach, where all the steps of the process are performed in the same formal machine (as in a typical HMM in speech recognition), so that an error at a given step does not remain unrecoverable in subsequent steps. After a presentation of the theoretical basis of the proposed multi-source information system, its application to two real-world problems is addressed as an example of the possibilities of this architecture. The experimental results obtained demonstrate that significant user effort can be saved when using the proposed procedure. A simple demonstration, intended to help understand and evaluate the proposed system, is available on the web at https://demos.iti.upv.es/hi/.
Navarro Cerdan, J.R.; Llobet Azpitarte, R.; Arlandis, J.; Perez-Cortes, J. (2016). Composition of Constraint, Hypothesis and Error Models to improve interaction in Human-Machine Interfaces. Information Fusion 29:1-13. doi:10.1016/j.inffus.2015.09.001
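The core idea above — composing an error model with a constraint model and searching the result for the cheapest output — can be sketched in a few lines. This is a toy illustration, not the authors' implementation (which presumably relies on standard WFST toolkit operations); it assumes epsilon-free transducers over the tropical semiring, and the error and lexicon models below are hypothetical examples.

```python
import heapq

def compose(t1, t2):
    """Compose two epsilon-free WFSTs over the tropical semiring
    (weights add along a path; lower is better). A transducer is
    (start, finals, transitions), each transition a 5-tuple
    (src, in_sym, out_sym, weight, dst)."""
    s1, f1, tr1 = t1
    s2, f2, tr2 = t2
    trans = []
    for (q1, a, b, w1, r1) in tr1:
        for (q2, b2, c, w2, r2) in tr2:
            if b == b2:  # output of T1 must match input of T2
                trans.append(((q1, q2), a, c, w1 + w2, (r1, r2)))
    return ((s1, s2), {(x, y) for x in f1 for y in f2}, trans)

def best_output(t, s):
    """Dijkstra over the composed machine: the cheapest output
    string that the transducer assigns to input string s."""
    start, finals, trans = t
    heap = [(0.0, start, 0, "")]
    seen = set()
    while heap:
        w, q, i, out = heapq.heappop(heap)
        if (q, i) in seen:
            continue
        seen.add((q, i))
        if i == len(s) and q in finals:
            return out, w
        for (src, a, b, wt, dst) in trans:
            if src == q and i < len(s) and a == s[i]:
                heapq.heappush(heap, (w + wt, dst, i + 1, out + b))
    return None

# Hypothetical error model: characters map to themselves for free,
# and the OCR confusion '0' -> 'o' costs 0.5.
E = (0, {0}, [(0, ch, ch, 0.0, 0) for ch in "cat0"] +
             [(0, '0', 'o', 0.5, 0)])
# Hypothetical constraint model: a lexicon accepting "cat" and "cot".
C = (0, {3}, [(0, 'c', 'c', 0.0, 1), (1, 'a', 'a', 0.0, 2),
              (1, 'o', 'o', 0.0, 2), (2, 't', 't', 0.0, 3)])

print(best_output(compose(E, C), "c0t"))  # ('cot', 0.5)
```

The same composition mechanism extends to the hypothesis and user-input models: each extra knowledge source is one more composition, and the best-path search stays unchanged.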
Efficient search in hidden text of large DjVu documents
The paper describes an open-source tool which allows end-users
to be presented with the results of advanced language technologies.
It relies on the DjVu format, which for some applications is still
superior to other modern formats, including PDF/A. The GPLed DjVu
tools are not limited to the DjVuLibre library alone, but are being
supplemented by various new programs, such as pdf2djvu, developed
by Jakub Wilk. In particular, it allows the PDF output of popular
OCR programs like FineReader to be converted to DjVu while
preserving the hidden text layer and some other features.
The tool in question was conceived by the present author and
consists of a modification of the Poliqarp corpus query tool used
for the National Corpus of Polish; his ideas were very successfully
implemented by Jakub Wilk. The new system, called here simply
Poliqarp for DjVu, inherits from its origin not only powerful
search facilities based on two-level regular expressions, but also
the ability to represent low-level ambiguities and other linguistic
phenomena. Although at present the tool is used mainly to
facilitate access to the results of dirty OCR, it is ready to
handle more sophisticated output of linguistic technologies as well.
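The ability to represent low-level ambiguities can be pictured as follows: each corpus position carries a set of alternative readings (for dirty OCR, competing recognition hypotheses), and a query matches a position if any alternative matches. This is an illustrative sketch of that matching semantics only, not Poliqarp's actual query engine or syntax; the corpus data below is invented.

```python
import re

# Each position holds a set of alternative readings -- the
# "low-level ambiguity" of dirty OCR output.
corpus = [
    {"kot", "k0t"},      # ambiguous OCR reading
    {"siedzi"},
    {"na", "ua"},        # another OCR confusion
    {"macie"},
]

def find(pattern, corpus):
    """Return indices of positions where at least one
    alternative fully matches the regular expression."""
    rx = re.compile(pattern)
    return [i for i, alts in enumerate(corpus)
            if any(rx.fullmatch(a) for a in alts)]

print(find(r"k.t", corpus))    # [0] -- matches despite the OCR error
print(find(r"[nu]a", corpus))  # [2]
```

Under these semantics a search can succeed even when the preferred OCR reading is wrong, as long as the correct form survives among the alternatives.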
Coupled snakelets for curled text-line segmentation from warpe
IJDAR. doi:10.1007/s10032-011-0176-2
Welcome from the program chairs: ICDAR 2011
Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), p. xxix. doi:10.1109/ICDAR.2011.6
Topic models for semantics-preserving video compression
Most state-of-the-art systems for content-based video understanding tasks require video content to be represented as collections of many low-level descriptors, e.g. as histograms of the color, texture or motion in local image regions. In order to preserve as much of the information contained in the original video as possible, these representations are typically high-dimensional, which conflicts with the aim for compact descriptors that would allow better efficiency and lower storage requirements. In this paper, we address the problem of semantic compression of video, i.e. the reduction of low-level descriptors to a small number of dimensions while preserving most of the semantic information. For this, we adapt topic models - which have previously been used as compact representations of still images - to take into account the temporal structure of a video, as well as multi-modal components such as motion information. Experiments on a large-scale collection of YouTube videos show that we can achieve a compression ratio of 20:1 compared to ordinary histogram representations and at least 2:1 compared to other dimensionality reduction techniques without significant loss of prediction accuracy. Also, improvements are demonstrated for our video-specific extensions modeling temporal structure and multiple modalities.
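The compression step — reducing a high-dimensional histogram to a short topic-mixture vector — can be illustrated with a minimal pLSA fitted by EM. This is a simplified stand-in for the paper's topic models (which extend them with temporal and multi-modal structure); the data, dimensions and function name are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa(X, K, iters=100):
    """Tiny pLSA via EM: compress each row of X (a histogram over
    V bins) into a K-dimensional topic mixture. X: (D, V) counts."""
    D, V = X.shape
    p_wz = rng.random((K, V)); p_wz /= p_wz.sum(1, keepdims=True)
    p_zd = rng.random((D, K)); p_zd /= p_zd.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibility of each topic z for pair (d, w)
        joint = p_zd[:, :, None] * p_wz[None, :, :]       # (D, K, V)
        resp = joint / joint.sum(1, keepdims=True).clip(1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        counts = X[:, None, :] * resp                     # (D, K, V)
        p_wz = counts.sum(0); p_wz /= p_wz.sum(1, keepdims=True)
        p_zd = counts.sum(2); p_zd /= p_zd.sum(1, keepdims=True)
    return p_zd, p_wz

# Two kinds of "videos": descriptor histograms concentrated on the
# first or the second half of a 20-bin vocabulary.
V = 20
X = np.vstack([rng.multinomial(100, np.r_[np.ones(10), np.zeros(10)] / 10)
               for _ in range(5)] +
              [rng.multinomial(100, np.r_[np.zeros(10), np.ones(10)] / 10)
               for _ in range(5)])
theta, _ = plsa(X, K=2)
print(theta.round(2))  # each 20-dim histogram is now a 2-dim mixture
```

The topic mixtures `theta` play the role of the compact semantic descriptor: 20 dimensions reduced to 2, a 10:1 ratio in this toy setup, analogous to the 20:1 ratio reported above.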